
perf: avoid O(N^2) exiting-branch checks in CodeFolding #8599

Open

Changqing-JING wants to merge 3 commits into WebAssembly:main from Changqing-JING:opt/compile-speed

Conversation

Contributor

Changqing-JING commented Apr 14, 2026

Follow-up PR to #8586, further optimizing CodeFolding.

`optimizeTerminatingTails` calls `EffectAnalyzer` once per tail item, and each call walks the tail's full subtree. On deeply nested blocks this is O(N^2).

Replace the per-item walks with a single O(N) bottom-up `PostWalker` (`populateExitingBranchCache`) that pre-computes the exiting-branch result for every node, making subsequent lookups O(1).

Example: AssemblyScript GC compiles `__visit_members` as a `br_table` dispatch over all types, producing ~N nested blocks with ~N tails. The old code walks each tail's subtree separately, for O(N^2) total node visits. With this change, one bottom-up walk covers all nodes, and each subsequent tail lookup is O(1).

```wat
(block $A          ;; depth 4000
  (block $B        ;; depth 3999
    (block $C      ;; depth 3998
      ...
      (br_table $A $B $C ... (local.get $rtid))
    )
    (unreachable)  ;; tail at depth 3999, old code walks 3999 nodes
  )
  (unreachable)    ;; tail at depth 4000, old code walks 4000 nodes
)
```

Benchmark data: the test module is from issue #7319 (#7319 (comment)).

On main HEAD:

```shell
$ time ./build/bin/wasm-opt -Oz --enable-bulk-memory --enable-multivalue --enable-reference-types --enable-gc --enable-tail-call --enable-exception-handling -o /dev/null ./test3.wasm

real    9m16.111s
user    35m33.985s
sys     0m51.000s
```

With this PR:

```shell
$ time ./build/bin/wasm-opt -Oz --enable-bulk-memory --enable-multivalue --enable-reference-types --enable-gc --enable-tail-call --enable-exception-handling -o /dev/null ./test3.wasm

real    5m17.170s
user    30m9.198s
sys     0m28.030s
```

Changqing-JING requested a review from a team as a code owner April 14, 2026 03:38
Changqing-JING requested review from kripken and removed request for a team April 14, 2026 03:38
Changqing-JING marked this pull request as draft April 14, 2026 03:38
Comment thread src/passes/CodeFolding.cpp Outdated
Comment thread src/passes/CodeFolding.cpp Outdated
Comment thread src/passes/CodeFolding.cpp Outdated
```cpp
// efficient bottom-up traversal.
bool hasExitingBranches(Expression* expr) {
  if (!exitingBranchCachePopulated_) {
    populateExitingBranchCache(getFunction()->body);
```
Member


Looks like this still scans the entire function. I suggest that we only scan expr itself. That will still avoid re-computing things, but avoid scanning things that we never need to look at.

This does require that the cache store a bool, so we know if we scanned or not, and if we did, if we found branches out or not. But I think that is worth it - usually we will scan very few things.

Contributor Author


The per-expression cache would still be O(N^2) in the nested block case. AssemblyScript GC emits `__visit_members` with deeply nested blocks + `br_table`, where the nesting level equals the number of classes (4000+ in real apps). Each nested block gets queried by `optimizeTerminatingTails`, and each query walks its overlapping subtree independently, giving O(N + (N-1) + ... + 1) = O(N^2) total work even with the cache.

We also cannot reuse a child's cached bool to compute a parent's result, because knowing "child has exiting branches" does not tell us which names exit -- the parent may define/resolve some of them. To compose results bottom-up, we would need to store the full set of unresolved names per expression. I benchmarked that approach (storing `unordered_map<Expression*, unordered_set>` and propagating name sets upward), but the per-node set allocation overhead on millions of nodes made -Oz significantly slower than the baseline (~13min vs ~5min).

The whole-function scan avoids both issues by computing all results in a single O(N) pass using only integer counters, with no per-node name storage.

Changqing-JING requested a review from kripken April 16, 2026 01:58